Abstract. We use a classical combinatorial inequality to establish a Markov 
inequality for multivariate binary Markov processes on trees. We then apply 
this result, together with the FKG inequality, to compare the expected loss 
of biodiversity under two models of species extinction. One of these models 
is the generalized version of an earlier model in which extinction is influenced 
by some trait that can be classified into two states and which evolves on a 
tree according to a Markov process. Since more than one trait can affect the 
rates of species extinction, it is reasonable to allow, in the generalized model, 
k binary states that influence extinction rates. We compare this model to one 
that has matching marginal extinction probabilities for each species but for 
which the species extinction events are stochastically independent. 
1. Introduction 

The concept of a 'Markov process on a tree' generalizes the notion of a Markov 
chain and has been extensively studied in physics, information theory, and evolu- 
tionary biology. In evolution it is used to model the stochastic evolution of traits on 
a phylogenetic tree [5] [12]. In this work, we establish a generic Markov inequality 
for multivariate Markov processes that consist of k independent but not necessarily 
identical two-state Markov processes on a tree. The inequality has been specifi- 
cally designed for the purpose of comparing a new species extinction model with 
existing ones in conservation biology. This new model is the generalized version 
of the 's-FOB' model [16], in which the extinction risk of a species is associated 
with an underlying state that evolves on an evolutionary tree. In the more general 
setting, extinction is influenced by k independently evolved traits rather than only 
one, giving a more realistic model. 

We compare the expected loss and the variance of 'phylogenetic diversity' under 
this model to the corresponding values of a simpler model in which extinction events 
are treated independently. We show that when extinction events reflect the evolu- 
tionary history of many characteristics, the expected loss of phylogenetic diversity 
is greater than or equal to that predicted under a model with independent extinction 
events. This generalizes the result presented in Section 3 of [16], and suggests 
that simple models that treat species extinctions independently may systematically 
underestimate the loss of phylogenetic diversity. 

Given this inequality between the expected future phylogenetic diversity under 
these two models, we might expect a similar inequality to apply for the variance. 
However, we show that there is no similar relationship between the variances cor- 
responding to the two models. There are examples for which the variance of fu- 
ture phylogenetic diversity under an independent extinction scenario can be either 
smaller or greater than the variance under the model in which extinction events are 
influenced by k characteristics, even for k = 1. 

In the next section, we define the multivariate Markov processes under scrutiny 
and then state and prove the Markov inequality. To demonstrate the phylogenetic 
application, Section 3 presents the inequality between the expected loss of 
phylogenetic diversity and our findings concerning the variance of future phylogenetic 
diversity. 



2. The Markov inequality 

Let $T$ be a rooted tree with root vertex $\rho$ and leaf set $X$. Consider $k$ independent, not necessarily identical, two-state Markov processes on $T$, each with state space $\{0, 1\}$ (for a formal definition of Markov processes on trees, see, for example, [H [T21 Q~5]). For each vertex $v$ of $T$ and for $j = 1, \ldots, k$, let $\xi_j(v)$ denote the random state that $v$ is assigned in the $j$th Markov process. Furthermore, for $j = 1, \ldots, k$ and for $i \in \{0, 1\}$, let $\pi^{(j)}_i$ be the probability that $\xi_j(\rho) = i$. Viewing the edges of $T$ as arcs directed away from the root, let $P^{(j)}(r, s)$ be the transition matrix assigned to arc $(r, s)$ in the $j$th process. The $il$-entry $P^{(j)}(r, s)_{il}$ of this $2 \times 2$ matrix is, by definition, the conditional probability that $\xi_j(s) = l$ given that $\xi_j(r) = i$. For each $j$, once the probabilities $\pi^{(j)}_i$ and the transition matrices $P^{(j)}(r, s)$, $i \in \{0, 1\}$, $(r, s) \in A_T$ (the arc set of $T$), have been specified, the $j$th Markov process on $T$ is uniquely defined [2] [13] [15].

We now combine these $k$ Markov processes into a vector (having $j$th coordinate $\xi_j$) to obtain a multivariate Markov process on $T$ with state space $\{0, 1\}^k$. In this process, each vertex $v$ of $T$ is assigned the state $\xi(v) = (\xi_1(v), \ldots, \xi_k(v))$. Let $\mathbf{i} = (i_1, \ldots, i_k) \in \{0, 1\}^k$ and let $\pi_{\mathbf{i}}$ be the probability that $\xi(\rho) = \mathbf{i}$. Then, by the independence of the $k$ processes, we get $\pi_{\mathbf{i}} = \prod_{j=1}^{k} \pi^{(j)}_{i_j}$. Similarly, for the transition matrix $P(r, s)$ corresponding to arc $(r, s)$ in the multivariate process, the entry $P(r, s)_{\mathbf{i}\mathbf{l}}$ in 'row $\mathbf{i}$' and 'column $\mathbf{l}$' (for $\mathbf{i} = (i_1, \ldots, i_k)$, $\mathbf{l} = (l_1, \ldots, l_k) \in \{0, 1\}^k$) becomes $\prod_{j=1}^{k} P^{(j)}(r, s)_{i_j l_j}$. This is the conditional probability that $\xi(s) = \mathbf{l}$ given that $\xi(r) = \mathbf{i}$. With these, the multivariate Markov process is uniquely defined.
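One way to check the entry formula numerically is to note that, with the states of $\{0,1\}^k$ ordered lexicographically, the multivariate matrix is the Kronecker product of the $k$ two-state matrices. The Python sketch below assumes NumPy; the helper names and the parametrization of each $2 \times 2$ matrix by its off-diagonal entries are illustrative choices, not notation from the text.

```python
import itertools

import numpy as np


def two_state_matrix(a, b):
    """Stochastic 2x2 matrix with off-diagonal entries a = P_01 and b = P_10."""
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def multivariate_matrix(mats):
    """Kronecker product P^(1) (x) ... (x) P^(k); the state (i_1, ..., i_k)
    receives the lexicographic index i_1 * 2^(k-1) + ... + i_k."""
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out


def state_index(state):
    """Lexicographic index of a binary state tuple such as (0, 1, 1)."""
    return int("".join(str(bit) for bit in state), 2)


def entries_match(mats):
    """Compare every entry of the multivariate matrix with prod_j P^(j)_{i_j l_j}."""
    big = multivariate_matrix(mats)
    k = len(mats)
    for i in itertools.product((0, 1), repeat=k):
        for l in itertools.product((0, 1), repeat=k):
            formula = np.prod([mats[j][i[j], l[j]] for j in range(k)])
            if not np.isclose(big[state_index(i), state_index(l)], formula):
                return False
    return True


# Hypothetical per-process matrices for a single arc (r, s), with k = 3.
mats = [two_state_matrix(0.1, 0.3), two_state_matrix(0.25, 0.05), two_state_matrix(0.4, 0.2)]
print(entries_match(mats))                      # True
print(multivariate_matrix(mats).sum(axis=1))    # each row sums to 1 (up to rounding)
```

The lexicographic order is what makes the Kronecker product line up with the product formula; any other fixed ordering of $\{0,1\}^k$ would only permute rows and columns.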

We will assume throughout that all the $\pi$ values are strictly positive and that $\det P^{(j)}(r, s) \geq 0$ for each arc $(r, s)$ and for each $j$. Note that this implies that $\det P(r, s) \geq 0$. Namely, with the states in $\{0, 1\}^k$ ordered lexicographically, $P(r, s)$ is the Kronecker product $P^{(1)}(r, s) \otimes \cdots \otimes P^{(k)}(r, s)$ of the $k$ matrices $P^{(j)}(r, s)$, and so $\det P(r, s) = \left(\det P^{(1)}(r, s) \times \cdots \times \det P^{(k)}(r, s)\right)^{2^{k-1}}$ (see [5] for the definition and properties of the Kronecker product). However, we assume neither that any of the $k$ processes are identical, nor that, within any one of them, all arcs are assigned the same transition matrix.
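The exponent $2^{k-1}$ comes from the standard identity $\det(A \otimes B) = (\det A)^m (\det B)^n$ for an $n \times n$ matrix $A$ and an $m \times m$ matrix $B$, applied $k - 1$ times. A minimal NumPy sketch that checks the resulting formula for randomly drawn two-state matrices follows; the sampling scheme, which keeps every determinant positive, is our own illustrative choice, and log-determinants are compared to avoid underflow.

```python
import numpy as np


def kron_all(mats):
    """Kronecker product of a list of square matrices, taken left to right."""
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out


def two_state_matrix(a, b):
    """Stochastic 2x2 matrix [[1-a, a], [b, 1-b]]; its determinant is 1 - a - b."""
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def det_identity_holds(mats):
    """Check det(P^(1) (x) ... (x) P^(k)) = (prod_j det P^(j))^(2^(k-1)) on the
    log scale. Assumes every det P^(j) > 0 so that the logarithms exist."""
    k = len(mats)
    sign, logdet = np.linalg.slogdet(kron_all(mats))
    expected = 2 ** (k - 1) * sum(np.log(np.linalg.det(m)) for m in mats)
    return bool(sign > 0 and np.isclose(logdet, expected))


rng = np.random.default_rng(0)
for k in range(1, 6):
    # Off-diagonal entries below 0.4 keep each determinant 1 - a - b above 0.2.
    mats = [two_state_matrix(*rng.uniform(0.0, 0.4, size=2)) for _ in range(k)]
    print(k, det_identity_holds(mats))
```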

Consider now a realization $U = (U_1, \ldots, U_k)$ of $\xi = (\xi_1, \ldots, \xi_k)$. Note that $U$ is a function from the vertex set $V$ of $T$ into the set $\{0, 1\}^k$ of character states. Let $P(U)$ denote the probability that $\xi = U$, that is, the probability that each $v \in V$ is assigned $U(v)$. For $j = 1, \ldots, k$, let $\delta_j(U, v) = 0$ if the $j$th coordinate $U_j(v)$ of $U(v)$ is $0$, and let $\delta_j(U, v) = 1$ if $U_j(v) = 1$. Also, let $\delta(U, v)$ denote the state that $v$ is assigned in $U$. We can now express $P(U)$ in terms of the transition matrices and the $\pi$ values of the multivariate process, using the Markov property (we follow [13]). We have:

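$$P(U) = \pi_{\delta(U, \rho)} \prod_{(r, s) \in A_T} P(r, s)_{\delta(U, r)\,\delta(U, s)},$$

which, by the independence of the $k$ two-state processes, gives:

$$P(U) = \prod_{j=1}^{k} \pi^{(j)}_{\delta_j(U, \rho)} \prod_{(r, s) \in A_T} \prod_{j=1}^{k} P^{(j)}(r, s)_{\delta_j(U, r)\,\delta_j(U, s)} = \prod_{j=1}^{k} \left( \pi^{(j)}_{\delta_j(U, \rho)} \prod_{(r, s) \in A_T} P^{(j)}(r, s)_{\delta_j(U, r)\,\delta_j(U, s)} \right). \tag{1}$$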
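To see equation (1) in action, the sketch below enumerates every realization $U$ on a small hypothetical tree (root `rho` with children `a`, `b`, `s`, and `s` with children `c`, `d`; the root distributions and transition matrices are made up) for $k = 2$. It checks that the factorized form (1) agrees with the multivariate form $\pi_{\delta(U,\rho)} \prod P(r,s)_{\delta(U,r)\delta(U,s)}$ built from Kronecker products, and that the probabilities $P(U)$ sum to 1. NumPy is assumed.

```python
import itertools

import numpy as np

# Hypothetical small tree: root "rho" with children a, b, s; s has children c, d.
VERTICES = ["rho", "a", "b", "s", "c", "d"]
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]


def stochastic(a, b):
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def state_index(state):
    """Lexicographic index of a state in {0,1}^k (first coordinate most significant)."""
    return int("".join(str(bit) for bit in state), 2)


def factorized_prob(U, pis, mats):
    """Right-hand side of (1): the product over j of the bracketed factors P_j(U)."""
    total = 1.0
    for j, (pi, arc_mats) in enumerate(zip(pis, mats)):
        factor = pi[U["rho"][j]]
        for r, s in ARCS:
            factor *= arc_mats[(r, s)][U[r][j], U[s][j]]
        total *= factor
    return total


def multivariate_prob(U, pis, mats):
    """pi_{delta(U, rho)} times the multivariate entries P(r, s)_{delta(U,r) delta(U,s)},
    with the root law and each P(r, s) formed as Kronecker products over the k processes."""
    root = pis[0]
    for pi in pis[1:]:
        root = np.kron(root, pi)
    value = root[state_index(U["rho"])]
    for arc in ARCS:
        big = mats[0][arc]
        for arc_mats in mats[1:]:
            big = np.kron(big, arc_mats[arc])
        value *= big[state_index(U[arc[0]]), state_index(U[arc[1]])]
    return value


def realizations(k):
    """All functions from VERTICES into {0,1}^k (4096 of them when k = 2)."""
    states = list(itertools.product((0, 1), repeat=k))
    for choice in itertools.product(states, repeat=len(VERTICES)):
        yield dict(zip(VERTICES, choice))


pis = [np.array([0.6, 0.4]), np.array([0.3, 0.7])]
mats = [{arc: stochastic(0.1, 0.2) for arc in ARCS},
        {arc: stochastic(0.15, 0.05) for arc in ARCS}]
probs = [(factorized_prob(U, pis, mats), multivariate_prob(U, pis, mats)) for U in realizations(2)]
print(all(np.isclose(f, m, rtol=1e-9, atol=0.0) for f, m in probs))   # True
print(sum(f for f, _ in probs))                                       # 1.0 up to rounding
```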

Recall that a lattice $\mathcal{L}$ is a partially ordered set in which any two elements $a, b \in \mathcal{L}$ have a unique least upper bound $a \vee b$, called their join, and a unique greatest lower bound $a \wedge b$, called their meet. A lattice is distributive if $a \wedge (b \vee c) = (a \wedge b) \vee (a \wedge c)$ for all $a, b, c \in \mathcal{L}$ or, equivalently, $a \vee (b \wedge c) = (a \vee b) \wedge (a \vee c)$ for all $a, b, c \in \mathcal{L}$.

Let $\mathcal{L}_V$ be the set of all possible realizations of $\xi$. Let $Y, Z \in \mathcal{L}_V$, and let $\leq$ be the partial order on $\mathcal{L}_V$ in which $Y \leq Z$ whenever $Y_j(v) \leq Z_j(v)$ for each vertex $v \in V$ and for each $j = 1, \ldots, k$; if neither $Y \leq Z$ nor $Z \leq Y$, then $Y$ and $Z$ are incomparable. Clearly, any two elements $Y$ and $Z$ of the partially ordered set $(\mathcal{L}_V, \leq)$ have a join $Y \vee Z$ and a meet $Y \wedge Z$. These are the realizations of $\xi$ that assign to each vertex $v \in V$ the state $(\max\{Y_1(v), Z_1(v)\}, \ldots, \max\{Y_k(v), Z_k(v)\})$ and the state $(\min\{Y_1(v), Z_1(v)\}, \ldots, \min\{Y_k(v), Z_k(v)\})$, respectively. It follows that $(\mathcal{L}_V, \vee, \wedge)$ is a lattice. This lattice is distributive, since its operations act coordinatewise and, on $\{0, 1\}$, $\max$ and $\min$ distribute over each other.
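Because the order is coordinatewise, a realization can be flattened into a 0/1 vector listing $U_j(v)$ for all vertices $v$ and coordinates $j$, and $(\mathcal{L}_V, \vee, \wedge)$ then behaves like the Boolean lattice of such vectors. The small Python sketch below is an exhaustive check on short vectors, written only for illustration: it confirms that coordinatewise max and min give the least upper and greatest lower bounds and that both distributive laws hold.

```python
import itertools


def join(Y, Z):
    """Join Y v Z: coordinatewise maximum of two flattened realizations."""
    return tuple(max(y, z) for y, z in zip(Y, Z))


def meet(Y, Z):
    """Meet Y ^ Z: coordinatewise minimum."""
    return tuple(min(y, z) for y, z in zip(Y, Z))


def leq(Y, Z):
    """Y <= Z iff every coordinate of Y is at most the matching coordinate of Z."""
    return all(y <= z for y, z in zip(Y, Z))


def check_lattice(n):
    """Exhaustively verify on {0,1}^n that join/meet are the least upper and
    greatest lower bounds, and that both distributive laws hold."""
    elems = list(itertools.product((0, 1), repeat=n))
    for Y, Z in itertools.product(elems, repeat=2):
        uppers = [W for W in elems if leq(Y, W) and leq(Z, W)]
        lowers = [W for W in elems if leq(W, Y) and leq(W, Z)]
        if join(Y, Z) not in uppers or not all(leq(join(Y, Z), W) for W in uppers):
            return False
        if meet(Y, Z) not in lowers or not all(leq(W, meet(Y, Z)) for W in lowers):
            return False
    for a, b, c in itertools.product(elems, repeat=3):
        if meet(a, join(b, c)) != join(meet(a, b), meet(a, c)):
            return False
        if join(a, meet(b, c)) != meet(join(a, b), join(a, c)):
            return False
    return True


# n = |V| * k; for example, two vertices and k = 2 give vectors of length 4.
print(check_lattice(4))   # True
```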

Recall that $X$ denotes the leaf set of $T$ and fix a non-empty subset $W$ of $X$. For each function $U$ in $\mathcal{L}_V$, define $u = (u_1, \ldots, u_k)$ to be the restriction of $U$ to $W$; that is, $u = U|_W$. With this we have $u(v) = U(v)$ for each leaf $v$ in $W$. Since $u$ is a function from the non-empty subset $W$ of $X$ into a set of character states, it is also called a character on $X$ [13]. Let $\mathcal{L}_W$ be the set that contains, for each $U \in \mathcal{L}_V$, the restricted function $u = U|_W$. Let $y, z \in \mathcal{L}_W$, and let $\leq$ be the partial order on $\mathcal{L}_W$ in which $y \leq z$ whenever $y_j(v) \leq z_j(v)$ for each $v \in W$ and for each $j = 1, \ldots, k$. The join $y \vee z$ and the meet $y \wedge z$ of any two elements $y, z$ of $\mathcal{L}_W$ are obtained analogously to the case of $\mathcal{L}_V$, defining the finite distributive lattice $(\mathcal{L}_W, \vee, \wedge)$. Now let $p(u)$ be the probability that each leaf $v$ in $W$ is assigned $u(v)$.

This marginal probability is given by:

$$p(u) = \sum_{U \in \mathcal{L}_u} P(U), \quad \text{where } \mathcal{L}_u := \{U \in \mathcal{L}_V : U|_W = u\}. \tag{2}$$

An example to illustrate this concept is provided in Figure 1. 



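[Figure 1: a rooted tree with root ρ, whose children are the leaves a and b and an interior vertex s; the children of s are the leaves c and d.]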




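Figure 1. Let $k = 1$, so that $u = (u_1)$ is identified with $u_1$. In this example, if $W = \{a, b, c, d\}$, $u(a) = u(c) = 0$ and $u(b) = u(d) = 1$, then

$$\begin{aligned} p(u) = {} & \pi_0 P(\rho, a)_{00} P(\rho, b)_{01} P(\rho, s)_{00} P(s, c)_{00} P(s, d)_{01} \\ & + \pi_0 P(\rho, a)_{00} P(\rho, b)_{01} P(\rho, s)_{01} P(s, c)_{10} P(s, d)_{11} \\ & + \pi_1 P(\rho, a)_{10} P(\rho, b)_{11} P(\rho, s)_{10} P(s, c)_{00} P(s, d)_{01} \\ & + \pi_1 P(\rho, a)_{10} P(\rho, b)_{11} P(\rho, s)_{11} P(s, c)_{10} P(s, d)_{11}. \end{aligned}$$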
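The four terms in the caption correspond to the four possible state assignments to the unobserved vertices $\rho$ and $s$. A short Python sketch (assuming NumPy; the numerical values of $\pi$ and of the transition matrices are randomly generated and purely hypothetical) computes $p(u)$ directly from (2) by summing over these hidden states and compares the result with the caption's expression.

```python
import itertools

import numpy as np

# Tree of Figure 1 with k = 1: root rho -> a, b, s and s -> c, d.
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]
HIDDEN = ["rho", "s"]


def marginal(u, pi, P):
    """p(u) from (2): sum of P(U) over the states of the unobserved vertices."""
    total = 0.0
    for states in itertools.product((0, 1), repeat=len(HIDDEN)):
        U = {**u, **dict(zip(HIDDEN, states))}
        term = pi[U["rho"]]
        for r, s in ARCS:
            term *= P[(r, s)][U[r], U[s]]
        total += term
    return total


def caption_formula(pi, P):
    """The four-term expression of the caption, for u(a) = u(c) = 0, u(b) = u(d) = 1."""
    A, B, S, C, D = (P[arc] for arc in ARCS)
    return (pi[0] * A[0, 0] * B[0, 1] * S[0, 0] * C[0, 0] * D[0, 1]
            + pi[0] * A[0, 0] * B[0, 1] * S[0, 1] * C[1, 0] * D[1, 1]
            + pi[1] * A[1, 0] * B[1, 1] * S[1, 0] * C[0, 0] * D[0, 1]
            + pi[1] * A[1, 0] * B[1, 1] * S[1, 1] * C[1, 0] * D[1, 1])


rng = np.random.default_rng(3)
pi = rng.dirichlet([1.0, 1.0])
# Each transition matrix is built from two random probability rows.
P = {arc: np.vstack([rng.dirichlet([1.0, 1.0]) for _ in range(2)]) for arc in ARCS}
u = {"a": 0, "b": 1, "c": 0, "d": 1}
print(np.isclose(marginal(u, pi, P), caption_formula(pi, P)))   # True
```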

The following proposition extends a result from [T^], which dealt with the special 
case $k = 1$. 

Proposition 2.1. Consider $k$ independent two-state Markov processes on a tree with leaf set $X$. Assume that, for each of them, all the determinants of the transition matrices are non-negative. Then, for the corresponding multivariate process and for any two characters $y, z : W \to \{0, 1\}^k$ on $X$ from a fixed non-empty subset $W$ of $X$, we have:

$$p(y) \cdot p(z) \leq p(y \vee z) \cdot p(y \wedge z).$$
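Proposition 2.1 can be probed numerically. The sketch below (assuming NumPy; the tree, the subset $W = \{a, c, d\}$, $k = 2$ and the random parameters are hypothetical choices) draws two-state transition matrices $[[1-a, a], [b, 1-b]]$ with $a + b \leq 1$, so that every determinant $1 - a - b$ is non-negative, computes $p(u)$ for all characters on $W$ by brute force from (2), and tests the inequality for every pair $y, z$.

```python
import itertools

import numpy as np

# Hypothetical tree: root rho -> a, b, s and s -> c, d.
VERTICES = ["rho", "a", "b", "s", "c", "d"]
ARCS = [("rho", "a"), ("rho", "b"), ("rho", "s"), ("s", "c"), ("s", "d")]


def random_stochastic_nonneg_det(rng):
    """Random stochastic matrix [[1-a, a], [b, 1-b]] with a + b <= 1,
    so that its determinant 1 - a - b is non-negative."""
    a, b = rng.uniform(0.0, 1.0, size=2)
    if a + b > 1.0:
        a, b = 1.0 - a, 1.0 - b
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def leaf_marginals(W, pis, mats):
    """p(u) for every character u on W, summed from P(U) as in (2).
    A character is flattened to (u_1(w_1), ..., u_k(w_1), u_1(w_2), ...)."""
    k = len(pis)
    states = list(itertools.product((0, 1), repeat=k))
    p = {}
    for choice in itertools.product(states, repeat=len(VERTICES)):
        U = dict(zip(VERTICES, choice))
        prob = 1.0
        for j in range(k):
            prob *= pis[j][U["rho"][j]]
            for r, s in ARCS:
                prob *= mats[j][(r, s)][U[r][j], U[s][j]]
        u = tuple(bit for w in W for bit in U[w])
        p[u] = p.get(u, 0.0) + prob
    return p


def is_log_supermodular(p, tol=1e-12):
    """Check p(y) p(z) <= p(y v z) p(y ^ z) for every pair of characters."""
    for y, z in itertools.product(p, repeat=2):
        y_join_z = tuple(max(s, t) for s, t in zip(y, z))
        y_meet_z = tuple(min(s, t) for s, t in zip(y, z))
        if p[y] * p[z] > p[y_join_z] * p[y_meet_z] + tol:
            return False
    return True


rng = np.random.default_rng(1)
pis = [rng.dirichlet([1.0, 1.0]) for _ in range(2)]
mats = [{arc: random_stochastic_nonneg_det(rng) for arc in ARCS} for _ in range(2)]
p = leaf_marginals(["a", "c", "d"], pis, mats)
print(is_log_supermodular(p))   # True, as Proposition 2.1 predicts
```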

Proof. Consider any two elements $Y$ and $Z$ of $\mathcal{L}_V$. We first prove the following:

$$P(Y) \cdot P(Z) \leq P(Y \vee Z) \cdot P(Y \wedge Z). \tag{3}$$

Denote the term in the brackets of equation (1) by $P_j(U)$ to get $P(U) = \prod_{j=1}^{k} P_j(U)$. Applying this to $U \in \{Y, Z, Y \vee Z, Y \wedge Z\}$, inequality (3) can be written as $\prod_{j=1}^{k} P_j(Y) \prod_{j=1}^{k} P_j(Z) \leq \prod_{j=1}^{k} P_j(Y \vee Z) \prod_{j=1}^{k} P_j(Y \wedge Z)$. Since all these factors are non-negative, proving $P_j(Y) P_j(Z) \leq P_j(Y \vee Z) P_j(Y \wedge Z)$ for each $j$ establishes (3).

So let $j$ be an arbitrary index in $\{1, \ldots, k\}$ and consider the products $P_j(Y) P_j(Z)$ and $P_j(Y \vee Z) P_j(Y \wedge Z)$. Each of these can be written as a product of two $\pi^{(j)}$ values multiplied by a product, over the arcs $(r, s)$ of $T$, of two entries of $P^{(j)}(r, s)$. The products of the two $\pi^{(j)}$ values agree in $P_j(Y) P_j(Z)$ and $P_j(Y \vee Z) P_j(Y \wedge Z)$, that is, $\pi^{(j)}_{\delta_j(Y, \rho)} \pi^{(j)}_{\delta_j(Z, \rho)} = \pi^{(j)}_{\delta_j(Y \vee Z, \rho)} \pi^{(j)}_{\delta_j(Y \wedge Z, \rho)}$. The products of the two $P^{(j)}(r, s)$ entries also agree in $P_j(Y) P_j(Z)$ and $P_j(Y \vee Z) P_j(Y \wedge Z)$, except in the cases where either (i) $\delta_j(Y, r) = 0$, $\delta_j(Y, s) = 1$, $\delta_j(Z, r) = 1$ and $\delta_j(Z, s) = 0$, or (ii) $\delta_j(Y, r) = 1$, $\delta_j(Y, s) = 0$, $\delta_j(Z, r) = 0$ and $\delta_j(Z, s) = 1$. In both cases (i) and (ii), the product $P^{(j)}(r, s)_{01} P^{(j)}(r, s)_{10}$ appears in the term for $P_j(Y) P_j(Z)$, while $P^{(j)}(r, s)_{00} P^{(j)}(r, s)_{11}$ appears in the term for $P_j(Y \vee Z) P_j(Y \wedge Z)$. The former is less than or equal to the latter, since $P^{(j)}(r, s)_{00} P^{(j)}(r, s)_{11} - P^{(j)}(r, s)_{01} P^{(j)}(r, s)_{10} = \det P^{(j)}(r, s)$, which is non-negative by our assumption. Consequently, each factor in $P_j(Y) P_j(Z)$ is less than or equal to the corresponding factor in $P_j(Y \vee Z) P_j(Y \wedge Z)$, and since all factors are non-negative, this establishes (3).

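We now recall a form of the 'four functions theorem', a classical result of Ahlswede and Daykin [1]. Let $(\mathcal{L}, \vee, \wedge)$ be a finite distributive lattice and let $\alpha$ be a function that assigns a non-negative real number to each element of $\mathcal{L}$. For a subset $\mathcal{A} \subseteq \mathcal{L}$, set $\alpha(\mathcal{A}) = \sum_{A \in \mathcal{A}} \alpha(A)$. If $\alpha$ satisfies the property that, for any two elements $A, B$ of $\mathcal{L}$, $\alpha(A)\alpha(B) \leq \alpha(A \vee B)\alpha(A \wedge B)$, then, for any two subsets $\mathcal{A}, \mathcal{B} \subseteq \mathcal{L}$,

$$\alpha(\mathcal{A})\,\alpha(\mathcal{B}) \leq \alpha(\mathcal{A} \vee \mathcal{B})\,\alpha(\mathcal{A} \wedge \mathcal{B}), \tag{4}$$

where $\mathcal{A} \vee \mathcal{B} = \{A \vee B : A \in \mathcal{A}, B \in \mathcal{B}\}$ and $\mathcal{A} \wedge \mathcal{B} = \{A \wedge B : A \in \mathcal{A}, B \in \mathcal{B}\}$.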

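We apply this theorem by taking $\mathcal{L} = \mathcal{L}_V$ and $\alpha = P$, and noting that $\alpha$ satisfies the required hypothesis by (3). Consider any fixed non-empty subset $W$ of $X$ and recall the definition of $\mathcal{L}_u$ (for $u \in \mathcal{L}_W$) in (2). Note that:

$$\mathcal{L}_y \vee \mathcal{L}_z = \mathcal{L}_{y \vee z} \quad \text{and} \quad \mathcal{L}_y \wedge \mathcal{L}_z = \mathcal{L}_{y \wedge z}.$$

Thus, taking $\mathcal{A} = \mathcal{L}_y$ and $\mathcal{B} = \mathcal{L}_z$ in (4), we deduce that:

$$\alpha(\mathcal{L}_y)\,\alpha(\mathcal{L}_z) \leq \alpha(\mathcal{L}_{y \vee z})\,\alpha(\mathcal{L}_{y \wedge z}),$$

which, by $\alpha = P$ and (2), is equivalent to $p(y)p(z) \leq p(y \vee z)p(y \wedge z)$. □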
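The four functions theorem (4) used in this proof can be explored by brute force on a small Boolean lattice. In the hypothetical setup below (not taken from the text), $\mathcal{L} = \{0,1\}^4$ and $\alpha$ is a ferromagnetic Ising-type weight $\exp(\sum_{i<j} J_{ij} x_i x_j + \sum_i h_i x_i)$ with all $J_{ij} \geq 0$; it satisfies $\alpha(A)\alpha(B) \leq \alpha(A \vee B)\alpha(A \wedge B)$ because each product $x_i x_j$ is supermodular on binary vectors and the linear terms are modular.

```python
import itertools
import math
import random

N = 4
ELEMS = list(itertools.product((0, 1), repeat=N))


def join(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))


def meet(a, b):
    return tuple(min(x, y) for x, y in zip(a, b))


def ising_weight(J, h):
    """alpha(x) = exp(sum_{i<j} J[i][j] x_i x_j + sum_i h[i] x_i); with every
    J[i][j] >= 0 this satisfies alpha(A) alpha(B) <= alpha(A v B) alpha(A ^ B)."""
    def alpha(x):
        s = sum(h[i] * x[i] for i in range(N))
        s += sum(J[i][j] * x[i] * x[j] for i in range(N) for j in range(i + 1, N))
        return math.exp(s)
    return alpha


def mass(alpha, family):
    """alpha(A) = sum of alpha over a family A of lattice elements."""
    return sum(alpha(x) for x in family)


def four_functions_holds(alpha, A, B):
    """Inequality (4): alpha(A) alpha(B) <= alpha(A v B) alpha(A ^ B), with A v B and
    A ^ B taken as sets of pairwise joins and meets."""
    A_join_B = {join(a, b) for a in A for b in B}
    A_meet_B = {meet(a, b) for a in A for b in B}
    lhs = mass(alpha, A) * mass(alpha, B)
    rhs = mass(alpha, A_join_B) * mass(alpha, A_meet_B)
    return lhs <= rhs * (1 + 1e-12)


random.seed(2)
J = [[random.uniform(0.0, 1.0) for _ in range(N)] for _ in range(N)]
h = [random.uniform(-1.0, 1.0) for _ in range(N)]
alpha = ising_weight(J, h)
trials = [(random.sample(ELEMS, random.randint(1, 8)), random.sample(ELEMS, random.randint(1, 8)))
          for _ in range(200)]
print(all(four_functions_holds(alpha, A, B) for A, B in trials))   # True
```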
3. Application: Predicting future phylogenetic diversity 

3.1. Expected future phylogenetic diversity. In this section, we use Proposition 2.1 to obtain an inequality concerning the expected loss of biodiversity under species extinction models. Consider a rooted directed tree $T = (V_T, A_T)$ with leaf set $X$, in which all the arcs are directed away from the root. Let each arc $a$ in $A_T$ be assigned a non-negative length $\lambda_a$. Here, $T$ represents the evolutionary history of the species in $X$, while $\lambda_a$ refers either to the amount of genetic change on arc $a$, to its temporal duration, or to some other feature such as morphological diversity.

Given a subset $Y$ of $X$, the phylogenetic diversity (PD) of $Y$, denoted $\varphi_Y$, is the sum of the lengths of the arcs of the minimal subtree of $T$ that connects the root and the leaves in $Y$. PD has been widely used to measure the biodiversity of a group of species [3J HI [TH [H>]; informally, the PD-score of a subset $Y$ measures how much of the total 'genetic' or 'evolutionary' diversity in the tree is spanned by the species in $Y$ alone (depending on whether the lengths assigned to the edges reflect the amount of genetic change or evolutionary time, respectively).
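A minimal Python sketch of $\varphi_Y$ follows. It assumes the tree is stored as a hypothetical parent map in which each non-root vertex $v$ records its parent and the length $\lambda$ of the arc into $v$; the example tree and lengths are made up. Every arc on a path from the root to a leaf of $Y$ is counted exactly once.

```python
def phylogenetic_diversity(Y, parent, length):
    """phi_Y: total length of the arcs of the minimal subtree joining the root to
    the leaves in Y. parent[v] is the parent of v (the root has no entry), and
    length[v] is the length of the arc (parent[v], v)."""
    arcs = set()  # each arc is identified by its head vertex
    for leaf in Y:
        v = leaf
        # Climb towards the root; stop early once we meet an arc already counted.
        while v in parent and v not in arcs:
            arcs.add(v)
            v = parent[v]
    return sum(length[v] for v in arcs)


# Hypothetical tree: root rho -> a, b, s and s -> c, d, with made-up arc lengths.
PARENT = {"a": "rho", "b": "rho", "s": "rho", "c": "s", "d": "s"}
LENGTH = {"a": 2.0, "b": 3.0, "s": 1.0, "c": 1.5, "d": 0.5}

print(phylogenetic_diversity({"c"}, PARENT, LENGTH))        # 2.5
print(phylogenetic_diversity({"a", "c"}, PARENT, LENGTH))   # 4.5
print(phylogenetic_diversity(set(), PARENT, LENGTH))        # 0
```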

As a function from $2^X$ to $\mathbb{R}^{\geq 0}$, $\varphi$ has some attractive properties for the discrete mathematician: as well as being a submodular, increasing function, it also has the property that the subsets of $X$ of given cardinality that have maximal $\varphi$ value form a (strong) greedoid, and so can be quickly constructed by the greedy algorithm (for details, see [11]).
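To illustrate the greedy construction mentioned above, the following sketch (hypothetical tree and randomly drawn arc lengths; `greedy_max_pd` is an illustrative name, not a function from the cited work) starts from the empty set, repeatedly adds the leaf that increases $\varphi$ the most, and compares the result with exhaustive search over all subsets of the same size.

```python
import itertools
import random

# Hypothetical rooted tree given as a parent map; the root is "rho".
PARENT = {"u": "rho", "v": "rho", "e": "rho", "a": "u", "b": "u",
          "w": "v", "f": "v", "c": "w", "d": "w"}
LEAVES = ["a", "b", "c", "d", "e", "f"]


def phylogenetic_diversity(Y, length):
    """Total length of the arcs joining the root to the leaves in Y;
    length[v] is the length of the arc into v."""
    arcs = set()
    for leaf in Y:
        v = leaf
        while v in PARENT and v not in arcs:
            arcs.add(v)
            v = PARENT[v]
    return sum(length[v] for v in arcs)


def greedy_max_pd(m, length):
    """Start from the empty set; repeatedly add the leaf with the largest gain in phi."""
    chosen = set()
    for _ in range(m):
        best = max((x for x in LEAVES if x not in chosen),
                   key=lambda x: phylogenetic_diversity(chosen | {x}, length))
        chosen.add(best)
    return chosen


def exhaustive_max_pd(m, length):
    """Maximum of phi over all m-element subsets of the leaves."""
    return max(phylogenetic_diversity(set(S), length)
               for S in itertools.combinations(LEAVES, m))


random.seed(4)
length = {v: random.uniform(0.1, 3.0) for v in PARENT}
for m in range(1, len(LEAVES) + 1):
    greedy = phylogenetic_diversity(greedy_max_pd(m, length), length)
    print(m, round(greedy, 6), round(exhaustive_max_pd(m, length), 6))
```

In each printed row the greedy value and the exhaustive optimum coincide, which is the behaviour the greedoid property guarantees for rooted PD.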

Assume that species in $X$ undergo random extinction and let $E_x$ denote the event that a species $x \in X$ is extinct at some fixed future time $t$. Consider the phylogenetic diversity $\varphi$ of the group of species that are still extant at time $t$. This random variable is referred to as future PD. An example illustrating this notion is given in Figure 2.




Figure 2. If only the species marked * in the tree on the left 
survive, then the future PD is the sum of the lengths of the solid 
edges in the tree on the right. 



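The expected value of $\varphi$ is:

$$\mathbb{E}[\varphi] = \sum_{a = (u, v) \in A_T} \lambda_a \Big(1 - \mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big)\Big) = \varphi_X - \sum_{a = (u, v) \in A_T} \lambda_a \, \mathbb{P}\Big(\bigcap_{x \in C_v} E_x\Big), \tag{5}$$

where $C_v$ denotes the subset of $X$ that is separated from the root by $v$ (so $C_v = \{v\}$ if $v$ is a leaf). $\mathbb{E}[\varphi]$ is referred to as expected future PD.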
In the generalized field of bullets model (g-FOB) [4], the events $E_x^{(g)} := E_x$ are independent, and so the probability $\mathbb{P}(\bigcap_{x \in C_v} E_x^{(g)})$ that all the species descended from $v$ become extinct can be written as:

$$\mathbb{P}\Big(\bigcap_{x \in C_v} E_x^{(g)}\Big) = \prod_{x \in C_v} p_x, \tag{6}$$

where $p_x$ denotes the probability $\mathbb{P}(E_x^{(g)})$.
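Under independent extinctions, (5) and (6) give $\mathbb{E}[\varphi]$ in closed form. The Python sketch below (hypothetical tree, arc lengths and extinction probabilities $p_x$) evaluates that closed form and compares it with a brute-force average of $\varphi$ over all $2^{|X|}$ extinction patterns.

```python
import itertools
import math

# Hypothetical tree: root rho -> a, b, s and s -> c, d, with made-up arc lengths.
PARENT = {"a": "rho", "b": "rho", "s": "rho", "c": "s", "d": "s"}
LENGTH = {"a": 2.0, "b": 3.0, "s": 1.0, "c": 1.5, "d": 0.5}
LEAVES = ["a", "b", "c", "d"]


def path_to_root(x):
    """x followed by all of its ancestors, ending at the root."""
    path = [x]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def clade(v):
    """C_v: the leaves separated from the root by v ({v} itself when v is a leaf)."""
    return [x for x in LEAVES if v in path_to_root(x)]


def pd(Y):
    """phi_Y: the arc (PARENT[v], v) counts once if some leaf of Y lies below v."""
    return sum(LENGTH[v] for v in PARENT if any(x in Y for x in clade(v)))


def expected_pd_closed_form(p):
    """(5) combined with (6): sum over arcs of lambda_a * (1 - prod_{x in C_v} p_x)."""
    return sum(LENGTH[v] * (1 - math.prod(p[x] for x in clade(v))) for v in PARENT)


def expected_pd_brute_force(p):
    """Average of phi over all extinction patterns, with independent extinctions."""
    total = 0.0
    for extinct in itertools.product((False, True), repeat=len(LEAVES)):
        prob = math.prod(p[x] if e else 1 - p[x] for x, e in zip(LEAVES, extinct))
        survivors = {x for x, e in zip(LEAVES, extinct) if not e}
        total += prob * pd(survivors)
    return total


p = {"a": 0.3, "b": 0.6, "c": 0.5, "d": 0.2}   # hypothetical g-FOB extinction probabilities
print(expected_pd_closed_form(p), expected_pd_brute_force(p))   # both approximately 4.65
```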

However, the assumption that the events $E_x$ are independent is likely to be unrealistic in most settings (see, for example, [HI [14]). In particular, the rates at which lineages become extinct may be influenced by certain species traits [10] [6]. The model referred to as the state-based field of bullets model (s-FOB) in [16] is based on the idea that closely related species in $T$ are more likely to share attributes that may put them at risk in a hostile future environment. It assumes that the extinction risk of each species is influenced by an associated binary state with values $0$ and $1$, where state $0$ confers an elevated risk of extinction, for example under climate change.

Here, we generalize this model and suppose that the extinction risk of each species $x$ is influenced by $k$ binary states, each of which takes values in $\{0, 1\}$, where state $1$ is always advantageous over state $0$ for $x$. We suppose that it is not known which features will help species survive, and so the states are not known for the species in $X$. However, we assume that the $k$ states have evolved under $k$ independent Markovian models on $T$, assigning a state in $\{0, 1\}^k$ to each species.

We assume further that, if the states were determined at the leaves, then extinction would proceed according to the g-FOB model in which species $x$ is extinct at time $t$ with probability $p_x^{\mathbf{i}}$ if it is in state $\mathbf{i} \in \{0, 1\}^k$. Finally, we suppose that, for each species $x \in X$ and any two states $\mathbf{i} = (i_1, \ldots, i_k)$ and $\mathbf{l} = (l_1, \ldots, l_k)$,

$$p_x^{\mathbf{i}} \leq p_x^{\mathbf{l}} \quad \text{whenever } l_j \leq i_j \text{ for each } j = 1, \ldots, k. \tag{7}$$

This condition says that state $\mathbf{l}$ confers at least as high an extinction risk on a species $x$ as state $\mathbf{i}$ if all the binary states in $\mathbf{i}$ are at least as 'advantageous' for $x$ as the binary states in $\mathbf{l}$. Note, however, that if the condition $l_j \leq i_j$ is not satisfied for every $j$, there is no prescribed relationship between $p_x^{\mathbf{i}}$ and $p_x^{\mathbf{l}}$. We have the freedom to specify these relationships according to the needs of the model being studied, or to leave them unspecified. For example, we may assume that the $k$ binary states are ordered by decreasing importance for survival and that $p_x^{\mathbf{i}} \leq p_x^{\mathbf{l}}$ whenever $l_j < i_j$ for the smallest coordinate $j \in \{1, \ldots, k\}$ for which $i_j \neq l_j$. Alternatively, we may assume that all the states are equally important for survival and that $p_x^{\mathbf{i}} \leq p_x^{\mathbf{l}}$ whenever $\sum_{j=1}^{k} l_j \leq \sum_{j=1}^{k} i_j$; that is, the more coordinates of the state assigned to $x$ are $1$, the smaller the extinction probability of $x$. In the following, we only assume the relationships described in (7).
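Condition (7) is easy to test mechanically. In the sketch below (function names and numerical values are hypothetical), `satisfies_condition_7` checks (7) for a table of probabilities $p_x^{\mathbf{i}}$ indexed by states in $\{0,1\}^k$, and the two constructors implement the two example orderings just described: a lexicographic ranking of the traits and a count of coordinates equal to 1.

```python
import itertools


def satisfies_condition_7(p_x, k):
    """(7): p_x[i] <= p_x[l] whenever l_j <= i_j for every coordinate j."""
    states = list(itertools.product((0, 1), repeat=k))
    return all(p_x[i] <= p_x[l]
               for i in states for l in states
               if all(lj <= ij for lj, ij in zip(l, i)))


def lexicographic_probs(k, high=0.9, low=0.1):
    """Traits ranked by importance (earlier coordinates matter more): the probability
    decreases linearly in the lexicographic rank of the state."""
    top = 2 ** k - 1
    return {i: high - (high - low) * int("".join(map(str, i)), 2) / top
            for i in itertools.product((0, 1), repeat=k)}


def count_probs(k, high=0.9, low=0.1):
    """Equally important traits: the probability depends only on the number of 1s."""
    return {i: high - (high - low) * sum(i) / k
            for i in itertools.product((0, 1), repeat=k)}


k = 3
print(satisfies_condition_7(lexicographic_probs(k), k))   # True
print(satisfies_condition_7(count_probs(k), k))           # True
reversed_probs = {i: 1.0 - q for i, q in count_probs(k).items()}
print(satisfies_condition_7(reversed_probs, k))           # False: state 1 made harmful
```

Both constructors satisfy (7) because coordinatewise domination of states implies both a larger lexicographic rank and a larger count of 1s.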

We call the model described above the trait-dependent field of bullets model (t-FOB). In the case when $k = 1$, this model is the s-FOB model, whereas the case where, for each $x$, $p_x^{\mathbf{i}} = p_x^{\mathbf{l}}$ for any two states $\mathbf{i}, \mathbf{l} \in \{0, 1\}^k$ gives the g-FOB model.
Given a t-FOB model, consider the g-FOB model in which the extinction probability of each species $x$ is the same as in the t-FOB model. That is, if $\xi$ describes the multivariate Markov process and the values $p_x^{\mathbf{i}}$ are the conditional extinction probabilities in the t-FOB model, then, in the associated g-FOB model, each species $x \in X$ goes extinct with probability

$$p_x = \mathbb{P}[E_x^{(g)}] = \mathbb{P}[E_x^{(t)}] = \sum_{\mathbf{i} \in \{0, 1\}^k} p_x^{\mathbf{i}} \, \mathbb{P}(\xi(x) = \mathbf{i}), \tag{8}$$

where $E_x^{(t)}$ denotes the event $E_x$ under t-FOB. Theorem 3.1 compares the loss of PD under a t-FOB model with the PD loss under the associated g-FOB model.
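For a single leaf $x$, $\mathbb{P}(\xi(x) = \mathbf{i})$ factorizes over the $k$ independent processes, and each factor is obtained by pushing $\pi^{(j)}$ through the transition matrices on the root-to-$x$ path. The sketch below (assuming NumPy; the function names and all numbers are hypothetical) evaluates (8) in this way.

```python
import itertools

import numpy as np


def leaf_state_distribution(pi_j, path_mats_j):
    """Law of xi_j(x): the row vector pi^(j) times the transition matrices on the
    arcs of the root-to-x path, taken in order from the root."""
    dist = np.asarray(pi_j, dtype=float)
    for M in path_mats_j:
        dist = dist @ M
    return dist


def associated_gfob_probability(p_x, pis, paths):
    """(8): p_x = sum_i p_x^i P(xi(x) = i), where P(xi(x) = i) = prod_j P(xi_j(x) = i_j)
    by the independence of the k processes. p_x maps state tuples to probabilities."""
    marg = [leaf_state_distribution(pi, path) for pi, path in zip(pis, paths)]
    total = 0.0
    for i in itertools.product((0, 1), repeat=len(pis)):
        total += p_x[i] * np.prod([marg[j][i[j]] for j in range(len(pis))])
    return total


# Hypothetical example with k = 2 and a root-to-x path of two arcs.
pis = [np.array([0.5, 0.5]), np.array([0.2, 0.8])]
paths = [[np.array([[0.9, 0.1], [0.2, 0.8]])] * 2,
         [np.array([[0.7, 0.3], [0.1, 0.9]])] * 2]
p_x = {(0, 0): 0.9, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.2}   # satisfies (7)
print(associated_gfob_probability(p_x, pis, paths))
```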

Theorem 3.1. Consider a t-FOB model on a fixed tree $T$ with non-negative arc 
lengths and with leaf set $X$. The expected future PD of this model is less than or 
equal to the expected future PD of the associated g-FOB model. 
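Theorem 3.1 can also be checked numerically on a small example. The sketch below (assuming NumPy; the tree, arc lengths, $k = 2$ and all probabilities are hypothetical) computes $\mathbb{E}[\varphi]$ under a t-FOB model by enumerating the realizations of $\xi$, computes the associated g-FOB value from (5), (6) and (8), and compares the two. The conditional probabilities decrease in every coordinate, so they satisfy (7), and the transition matrices have non-negative determinants, matching the standing assumption of Section 2.

```python
import itertools
import math

import numpy as np

# Hypothetical tree: root rho -> a, b, s and s -> c, d; LENGTH[v] is the arc into v.
VERTICES = ["rho", "a", "b", "s", "c", "d"]
PARENT = {"a": "rho", "b": "rho", "s": "rho", "c": "s", "d": "s"}
LENGTH = {"a": 2.0, "b": 3.0, "s": 1.0, "c": 1.5, "d": 0.5}
LEAVES = ["a", "b", "c", "d"]
K = 2
STATES = list(itertools.product((0, 1), repeat=K))


def path_to_root(x):
    path = [x]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


CLADES = {v: [x for x in LEAVES if v in path_to_root(x)] for v in PARENT}   # C_v


def random_matrix_nonneg_det(rng):
    """Stochastic [[1-a, a], [b, 1-b]] with a + b <= 1, i.e. determinant >= 0."""
    a, b = rng.uniform(0.0, 1.0, size=2)
    if a + b > 1.0:
        a, b = 1.0 - a, 1.0 - b
    return np.array([[1.0 - a, a], [b, 1.0 - b]])


def expected_pd_both_models(pis, mats, p_cond):
    """Return (E[phi] under t-FOB, E[phi] under the associated g-FOB).
    mats[j][v] is the matrix of process j on the arc (PARENT[v], v), and
    p_cond[x][i] is the conditional extinction probability p_x^i."""
    all_extinct = dict.fromkeys(PARENT, 0.0)   # P(every leaf of C_v extinct) under t-FOB
    p_marg = dict.fromkeys(LEAVES, 0.0)        # p_x from (8)
    for choice in itertools.product(STATES, repeat=len(VERTICES)):
        U = dict(zip(VERTICES, choice))
        prob = 1.0
        for j in range(K):
            prob *= pis[j][U["rho"][j]]
            for v, r in PARENT.items():
                prob *= mats[j][v][U[r][j], U[v][j]]
        for x in LEAVES:
            p_marg[x] += prob * p_cond[x][U[x]]
        for v, clade in CLADES.items():
            all_extinct[v] += prob * math.prod(p_cond[x][U[x]] for x in clade)
    phi_X = sum(LENGTH.values())
    e_tfob = phi_X - sum(LENGTH[v] * all_extinct[v] for v in PARENT)
    e_gfob = phi_X - sum(LENGTH[v] * math.prod(p_marg[x] for x in CLADES[v]) for v in PARENT)
    return e_tfob, e_gfob


rng = np.random.default_rng(6)
pis = [rng.dirichlet([1.0, 1.0]) for _ in range(K)]
mats = [{v: random_matrix_nonneg_det(rng) for v in PARENT} for _ in range(K)]
base = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.6}
# p_x^i decreases in every coordinate of i, so condition (7) holds.
p_cond = {x: {i: base[x] - 0.25 * sum(i) for i in STATES} for x in LEAVES}
e_t, e_g = expected_pd_both_models(pis, mats, p_cond)
print(e_t <= e_g, e_t, e_g)
```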



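Proof. Let $\xi$ and $p_x^{\mathbf{i}}$ denote the Markov process and the extinction probabilities of the t-FOB model, respectively. In view of (5) and (6), it suffices to show that, for each non-root vertex $v$ of $T$:

$$\mathbb{P}\Big(\bigcap_{x \in C_v} E_x^{(t)}\Big) \geq \prod_{x \in C_v} p_x, \tag{9}$$

where $p_x$ is given in (8). Recall how we defined the lattice $(\mathcal{L}_W, \vee, \wedge)$ for a Markov process on a tree and for a non-empty subset $W$ of the leaf set of the tree in the previous section, and consider $(\mathcal{L}_{C_v}, \vee, \wedge)$. Since, for $u \in \mathcal{L}_{C_v}$, $p(u)$ denotes the probability that each $x \in C_v$ is assigned $u(x) \in \{0, 1\}^k$, and since extinctions are independent once the leaf states are fixed, we get:

$$\mathbb{P}\Big(\bigcap_{x \in C_v} E_x^{(t)}\Big) = \sum_{u \in \mathcal{L}_{C_v}} p(u) \prod_{x \in C_v} f_x(u),$$

where $f_x(u)$ is the probability that $x$ becomes extinct given that it is in state $u(x)$; that is, $f_x(u) = p_x^{u(x)}$. Moreover, for each $x \in C_v$, we have:

$$p_x = \sum_{\mathbf{i} \in \{0, 1\}^k} p_x^{\mathbf{i}} \, \mathbb{P}(\xi(x) = \mathbf{i}) = \sum_{\mathbf{i} \in \{0, 1\}^k} p_x^{\mathbf{i}} \Big[\sum_{u \in \mathcal{L}_{C_v} : u(x) = \mathbf{i}} p(u)\Big] = \sum_{u \in \mathcal{L}_{C_v}} p(u) f_x(u).$$

Now we can rewrite (9) as

$$\prod_{x \in C_v} \Big(\sum_{u \in \mathcal{L}_{C_v}} p(u) f_x(u)\Big) \leq \sum_{u \in \mathcal{L}_{C_v}} p(u) \prod_{x \in C_v} f_x(u). \tag{10}$$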



The proof of (10) makes use of Proposition 2.1 as well as the following multivariate form of the FKG inequality of Fortuin, Kasteleyn and Ginibre (1971) [7]. Given a finite distributive lattice $(\mathcal{L}, \vee, \wedge)$, suppose that $f_1, f_2, \ldots, f_n$ are functions from $\mathcal{L}$ into the non-negative real numbers that satisfy, for any two elements $A$ and $B$ of $\mathcal{L}$, the condition that:

$$A \leq B \implies f_i(A) \geq f_i(B). \tag{11}$$

Furthermore, suppose that $\mu$ is a probability measure on the elements of $\mathcal{L}$ that satisfies the condition that

$$\mu(A)\mu(B) \leq \mu(A \vee B)\mu(A \wedge B) \quad \text{for any pair } A, B \in \mathcal{L}. \tag{12}$$

Then:

$$\prod_{i=1}^{n} \Big(\sum_{A \in \mathcal{L}} f_i(A)\mu(A)\Big) \leq \sum_{A \in \mathcal{L}} \mu(A) \prod_{i=1}^{n} f_i(A). \tag{13}$$

We apply this inequality by setting $\mathcal{L} = \mathcal{L}_{C_v}$, $\mu = p$ and $f_x(u) = p_x^{u(x)}$ for $u \in \mathcal{L}_{C_v}$ and $x \in C_v$. Note that each $f_x$ satisfies (11). Namely, $u \leq y$ (for $u, y \in \mathcal{L}_{C_v}$) means that $u_j(x) \leq y_j(x)$ for each coordinate $j$, which, by (7), implies $p_x^{u(x)} \geq p_x^{y(x)}$. Note also that $p$ satisfies (12) by Proposition 2.1. In view of these, (13) provides inequality (10), and the proof is complete. □
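The multivariate FKG inequality (13) used in this proof can be checked by brute force on a small Boolean lattice. In the hypothetical setup below on $\{0,1\}^4$, $\mu$ is an Ising-type measure with non-negative couplings (so it satisfies (12)), and the $f_i$ are non-negative decreasing functions of two kinds: exponentials $\exp(-\sum_j w_j x_j)$ with $w_j \geq 0$, and indicators of down-sets.

```python
import itertools
import math
import random

N = 4
ELEMS = list(itertools.product((0, 1), repeat=N))


def ising_measure(J, h):
    """mu(x) proportional to exp(sum_{i<j} J[i][j] x_i x_j + sum_i h[i] x_i);
    with every J[i][j] >= 0 this measure satisfies (12)."""
    weight = {x: math.exp(sum(h[i] * x[i] for i in range(N))
                          + sum(J[i][j] * x[i] * x[j]
                                for i in range(N) for j in range(i + 1, N)))
              for x in ELEMS}
    total = sum(weight.values())
    return {x: w / total for x, w in weight.items()}


def decreasing_exp(weights):
    """f(x) = exp(-sum_j w_j x_j): non-negative and decreasing when all w_j >= 0."""
    return lambda x: math.exp(-sum(w * xj for w, xj in zip(weights, x)))


def down_set_indicator(t):
    """f(x) = 1 if x <= t coordinatewise, else 0: also non-negative and decreasing."""
    return lambda x: 1.0 if all(a <= b for a, b in zip(x, t)) else 0.0


def fkg_holds(mu, fs):
    """Inequality (13): prod_i sum_A f_i(A) mu(A) <= sum_A mu(A) prod_i f_i(A)."""
    lhs = math.prod(sum(f(x) * mu[x] for x in ELEMS) for f in fs)
    rhs = sum(mu[x] * math.prod(f(x) for f in fs) for x in ELEMS)
    return lhs <= rhs + 1e-12


random.seed(5)
results = []
for _ in range(100):
    J = [[random.uniform(0.0, 1.0) for _ in range(N)] for _ in range(N)]
    h = [random.uniform(-1.0, 1.0) for _ in range(N)]
    fs = [decreasing_exp([random.uniform(0.0, 1.0) for _ in range(N)]) for _ in range(2)]
    fs.append(down_set_indicator(random.choice(ELEMS)))
    results.append(fkg_holds(ising_measure(J, h), fs))
print(all(results))   # True
```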

3.2. Variance of future PD. Consider now the variance of $\varphi$:

$$\mathrm{Var}[\varphi] = \mathrm{Cov}[\varphi, \varphi] = \sum_{a, b \in A_T} \lambda_a \lambda_b \, \mathrm{Cov}[Y_a, Y_b], \tag{14}$$

where $Y_a$ is the random variable that takes the value $1$ if arc $a$ is part of the subtree connecting the surviving species and the root, and the value $0$ otherwise. Our goal is to compare the variance under a t-FOB model to the variance under the associated g-FOB model. It is easy to find examples in which the former variance is greater than the latter, and so we will only show that the variance for a t-FOB model can be less than that of the associated g-FOB model. To this end, let $T$ be the tree with leaf set $\{x, y\}$ in which the arcs $b$ and $c$ pointing to $x$ and $y$, respectively, are incident with the single interior vertex of the tree, which is adjacent to the root by arc $a$. Consider $\mathrm{Cov}[Y_a, Y_a] = (1 - \mathbb{P}[E_x \cap E_y])\,\mathbb{P}[E_x \cap E_y]$, which is written as $(1 - \mathbb{P}[E_x^{(t)} \cap E_y^{(t)}])\,\mathbb{P}[E_x^{(t)} \cap E_y^{(t)}]$ under t-FOB and which becomes $(1 - p_x p_y)\,p_x p_y$ under g-FOB. Since $(1 - q)q - (1 - q')q' = (q - q')(1 - q - q')$ and, by (9), $\mathbb{P}[E_x^{(t)} \cap E_y^{(t)}] \geq p_x p_y$, the quantity $\mathrm{Cov}[Y_a, Y_a]$ is less under a t-FOB model than under the associated g-FOB model if and only if $\mathbb{P}[E_x^{(t)} \cap E_y^{(t)}] > p_x p_y$ and $\mathbb{P}[E_x^{(t)} \cap E_y^{(t)}] + p_x p_y > 1$ both hold. It is easy to see that these conditions can be satisfied by some t-FOB model (together with its associated g-FOB model) on $T$. Additionally, for any such t-FOB model, $\lambda_a$ can be chosen large enough relative to $\lambda_b$ and $\lambda_c$ that $\lambda_a^2 \,\mathrm{Cov}[Y_a, Y_a]$ is the dominant term in (14), resulting in a greater total variance for the corresponding g-FOB model.
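The cherry-tree argument can be explored numerically. The sketch below (assuming NumPy; the function names and all parameter values are hypothetical, and it does not reproduce the specific parameters of the Example that follows) computes $\mathrm{Var}[\varphi]$ on this tree from the joint law of $(E_x, E_y)$, once under an s-FOB model ($k = 1$) and once under its associated g-FOB model.

```python
import itertools

import numpy as np


def future_pd_variance(joint, lam_a, lam_b, lam_c):
    """Var[phi] on the cherry tree, where phi = lam_a*Y_a + lam_b*Y_b + lam_c*Y_c,
    Y_b (Y_c) = 1 if x (y) survives, and Y_a = 1 unless x and y are both extinct.
    joint[(ex, ey)] = P(E_x = ex, E_y = ey) for ex, ey in {0, 1}."""
    mean = second_moment = 0.0
    for (ex, ey), prob in joint.items():
        phi = lam_a * (1 - ex * ey) + lam_b * (1 - ex) + lam_c * (1 - ey)
        mean += prob * phi
        second_moment += prob * phi ** 2
    return second_moment - mean ** 2


def sfob_joint(pi, Pa, Pb, Pc, px, py):
    """Joint extinction law under an s-FOB model (k = 1): the state moves from the
    root along arc a to the interior vertex, then along b to x and along c to y;
    px[i] and py[i] are the extinction probabilities of x and y in state i."""
    joint = {(ex, ey): 0.0 for ex in (0, 1) for ey in (0, 1)}
    for r, m, sx, sy in itertools.product((0, 1), repeat=4):
        w = pi[r] * Pa[r, m] * Pb[m, sx] * Pc[m, sy]
        for ex, ey in itertools.product((0, 1), repeat=2):
            fx = px[sx] if ex else 1.0 - px[sx]
            fy = py[sy] if ey else 1.0 - py[sy]
            joint[(ex, ey)] += w * fx * fy
    return joint


def gfob_joint(joint):
    """Associated g-FOB: independent extinctions with the same marginals, as in (8)."""
    qx = joint[(1, 0)] + joint[(1, 1)]
    qy = joint[(0, 1)] + joint[(1, 1)]
    return {(ex, ey): (qx if ex else 1.0 - qx) * (qy if ey else 1.0 - qy)
            for ex in (0, 1) for ey in (0, 1)}


# Hypothetical parameters: a perfectly conserved trait (identity matrices),
# extinction certain in state 0 and with probability 0.8 in state 1.
pi = np.array([0.5, 0.5])
I2 = np.eye(2)
joint_t = sfob_joint(pi, I2, I2, I2, [1.0, 0.8], [1.0, 0.8])
joint_g = gfob_joint(joint_t)
var_t = future_pd_variance(joint_t, 10.0, 1.0, 1.0)
var_g = future_pd_variance(joint_g, 10.0, 1.0, 1.0)
print(round(var_t, 6), round(var_g, 6))   # 18.24 18.81
```

With these hypothetical values, $\mathbb{P}[E_x^{(t)} \cap E_y^{(t)}] = 0.82$ and $p_x p_y = 0.81$, so both conditions above hold, and with $\lambda_a = 10$, $\lambda_b = \lambda_c = 1$ the s-FOB variance is the smaller of the two.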

The following example describes an s-FOB (that is, a t-FOB with k = 1) under 
which the variance is less than the variance under the associated g-FOB. 

Example. Let $T$ be the tree shown in Figure 3 with arc lengths $\lambda_a = 4$ and $\lambda_b = \lambda_c = 1$, and consider the following s-FOB model on $T$. Let $\xi$ be a two-state Markov process on $T$ with state space $\{0, 1\}$ such that $\pi_0 = \pi_1 = \frac{1}{2}$ and each
arc is assigned the transition matrix (ft). Let p x — p y — | and p\ = p y = | . 

A careful check shows that the variance under this model is less than the variance 
under the associated g-FOB model (in which p x = p y = j|). 